Imagine you are the data scientist at a respected media outlet – say the “New York Times”. For the Winter Olympics coverage, your editor-in-chief asks you to analyze some data on the history of Winter Olympics Medals by Year, Country, Event and Gender and prepare some data visualizations in which you outline the main patterns around which to base the story.
Since there is no way that all features of the data can be represented in such a memo, feel free to pick and choose some patterns that would make for a good story – outlining important patterns and presenting them in a visually pleasing way.
The full background and text of the story will be researched by a writer of the magazine – your input should be based on the data and some common sense (i.e. no need to read up on this).
Provide polished plots that are refined enough to include in the magazine with very little further manipulation (already include variable descriptions [if necessary for understanding], titles, source [e.g. “International Olympic Committee”], right color etc.) and are understandable to the average reader of the “New York Times”. The design does not need to be NYTimes-like. Just be consistent.
The main data is provided as an excel sheet, containing the following variables on all participating athletes in all olympics from 1896 to 2016 (sadly, the original source of the data no longer updates beyond that year):
ID: a unique indentifier of the entryName: name of the athleteSex: sex of the athleteAge: age of the athleteHeight: height of the athleteWeight: weight of the athleteTeam: usually the country team of the athlete, with the exception of political accomodations, e.g. the “Refugee Olympic Athletes” team.NOC: national olympic comittee abbreviation.Games: year and season of games.Year: year of gamesSeason: season of games.City: host citySport: a grouping of disciplinesEvent: the particular event / competitionMedal: the particular event / competitionFor example, an event is a competition in a sport or discipline that gives rise to a ranking. Thus Alpine Skiing is the discipline, and Alpine Skiing Women's Downhills is a particular event.
In addition, you are provided with some additional information about the countries in a separate spreadsheet, including the IOC Country Code, Population, and GDP per capita.
athletes_and_events.csv, noc_regions.csv, and gdp_pop.csv. Note, that the noc_regions.csv is the set all NOC regions, while gdp_pop.csv only contains a snapshot of the current set of countries. You have to decide what to do with some countries that competed under different designations in the past (e.g. Germany and Russia) and some defunct countries and whether and how to combine their totals. Make sure to be clear about your decisions here, so that the editor (and potentially a user of your visualizations) understands what you did.## load in data sets
library(readr)
setwd('/Users/nikkigerjarusak/Documents/GitHub/assignment-1---winter-olympics-nikkigsak/data')
athletes_events <- read_csv("athletes_and_events.csv")
## Rows: 271116 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Name, Sex, Team, NOC, Games, Season, City, Sport, Event, Medal
## dbl (5): ID, Age, Height, Weight, Year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
noc_regions <- read_csv("noc_regions.csv")
## Rows: 230 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): NOC, region, notes
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gdp_pop <- read_csv("gdp_pop.csv")
## Rows: 201 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country, Code
## dbl (2): Population, GDP per Capita
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidymodels)
## Registered S3 method overwritten by 'tune':
## method from
## required_pkgs.model_spec parsnip
## ── Attaching packages ────────────────────────────────────── tidymodels 0.1.4 ──
## ✓ broom 0.7.9 ✓ rsample 0.1.1
## ✓ dials 0.0.10 ✓ tibble 3.1.6
## ✓ ggplot2 3.3.5 ✓ tidyr 1.2.0
## ✓ infer 1.0.0 ✓ tune 0.1.6
## ✓ modeldata 0.1.1 ✓ workflows 0.2.4
## ✓ parsnip 0.1.7 ✓ workflowsets 0.1.0
## ✓ purrr 0.3.4 ✓ yardstick 0.0.8
## ✓ recipes 0.1.17
## Warning: package 'tidyr' was built under R version 4.1.2
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## x purrr::discard() masks scales::discard()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x yardstick::spec() masks readr::spec()
## x recipes::step() masks stats::step()
## • Search for functions across packages at https://www.tidymodels.org/find/
library(ggplot2)
## merge datasets
df <- merge(athletes_events, noc_regions, by="NOC") ## merge by NOC
gdp_pop <- gdp_pop %>%
rename(NOC = Code)
df1 <- merge(df, gdp_pop, by="NOC")
## totals by NOC
df1 %>%
group_by(NOC) %>%
summarize(Games = n())
## # A tibble: 198 × 2
## NOC Games
## <chr> <int>
## 1 AFG 126
## 2 AHO 79
## 3 ALB 70
## 4 ALG 551
## 5 AND 169
## 6 ANG 267
## 7 ANT 133
## 8 ARG 3297
## 9 ARM 221
## 10 ARU 42
## # … with 188 more rows
## summary of how many winter games
df1 %>%
group_by(NOC) %>%
filter(Season == 'Winter') %>%
summarize(Season = n())
## # A tibble: 108 × 2
## NOC Season
## <chr> <int>
## 1 AHO 5
## 2 ALB 7
## 3 ALG 12
## 4 AND 116
## 5 ARG 384
## 6 ARM 47
## 7 ASA 2
## 8 AUS 546
## 9 AUT 2190
## 10 AZE 18
## # … with 98 more rows
## winter medals
medal <- df1 %>% filter(!is.na(Medal)) %>%
filter(Season == 'Winter') %>%
group_by(NOC, Medal) %>%
summarize(isMedal = n())
## `summarise()` has grouped output by 'NOC'. You can override using the `.groups` argument.
medalcount <- df1 %>% filter(!is.na(Medal)) %>%
filter(Season == 'Winter') %>%
group_by(NOC, Medal) %>%
summarize(isMedal = n()) %>%
pivot_wider(
names_from = Medal,
values_from = isMedal
)
## `summarise()` has grouped output by 'NOC'. You can override using the `.groups` argument.
medalcount
## # A tibble: 39 × 4
## # Groups: NOC [39]
## NOC Bronze Gold Silver
## <chr> <int> <int> <int>
## 1 AUS 7 6 3
## 2 AUT 103 79 98
## 3 BEL 7 2 4
## 4 BLR 5 6 4
## 5 BUL 3 1 2
## 6 CAN 107 305 199
## 7 CHN 34 16 30
## 8 CRO 1 4 6
## 9 CZE 34 27 12
## 10 DEN NA NA 5
## # … with 29 more rows
## count medals by country won in winter games
medal <- medal %>%
group_by(NOC, Medal) %>%
summarize(Count = sum(isMedal))
## `summarise()` has grouped output by 'NOC'. You can override using the `.groups` argument.
## plot medals won by each country
ggplot(medal, aes(x=NOC, y=Count, fill=Medal)) +
geom_col() +
coord_flip() +
scale_fill_manual(values=c("red","yellow","gray")) +
labs(x = "Countries", y = "Count",
title="Olympic Winter Medals")
Feel free to focus on smaller set of countries (say the top 10), highlight the United States or another country of your choice, consider gender of the medal winners etc. to make the visualization interesting.
Please provide (i) one visualization showing an over time comparison and (ii) one visualization in which a total medal count (across all Winter Olympics) is used. Briefly discuss which visualization you recommend to your editor and why.
Note: Currently, the medal data contains information on each athlete competing, including for team events. For example, in 2014 Russia received 4 gold medals for their men’s win in Bobsleigh Men’s Four alone. Since this is usually not how it is done in official medal statistics, try to wrangle the data so that team events are counted as a single medal.
## over time comparison of genders during winter
count_sex <- df1 %>%
filter(Season =="Winter") %>%
group_by(Year, Sex) %>%
summarize(Athletes = length(unique(ID)))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
count_sex
## # A tibble: 44 × 3
## # Groups: Year [22]
## Year Sex Athletes
## <dbl> <chr> <int>
## 1 1924 F 13
## 2 1924 M 269
## 3 1928 F 27
## 4 1928 M 393
## 5 1932 F 21
## 6 1932 M 221
## 7 1936 F 74
## 8 1936 M 518
## 9 1948 F 71
## 10 1948 M 526
## # … with 34 more rows
## over time comparison graph
ggplot(count_sex, aes(x=Year, y=Athletes, group=Sex, color=Sex)) +
geom_point(size=2) +
geom_line() +
scale_color_manual(values=c("blue","red")) +
labs(x = "Year", y = "Athletes",
title="Male vs. Female Athletes Comparison Over time During Winter Olympics")
winner_sex <- df1 %>%
filter(!is.na(Medal)) %>%
filter(Season == 'Winter') %>%
group_by(Year, Sex) %>%
summarize(Athletes = length(unique(ID)))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
winner_sex
## # A tibble: 44 × 3
## # Groups: Year [22]
## Year Sex Athletes
## <dbl> <chr> <int>
## 1 1924 F 6
## 2 1924 M 109
## 3 1928 F 6
## 4 1928 M 76
## 5 1932 F 6
## 6 1932 M 81
## 7 1936 F 9
## 8 1936 M 86
## 9 1948 F 12
## 10 1948 M 94
## # … with 34 more rows
ggplot(winner_sex, aes(x=Year, y=Athletes, group=Sex, color=Sex)) +
geom_point(size=2) +
geom_line() +
scale_color_manual(values=c("blue","red")) +
labs(x = "Year", y = "Athletes",
title="Male vs. Female Winter Gold Medalists Over time")
I would use the second visualization that shows the comparison of total medal counts across all Winter Olympics between female and male athletes because the lines show a similar trend. I think the graph does a good job of showing that even though female athletes have not been included in the Olympics for as long as men have, the number of medals won by female athletes is increasing in the same manner as the male athletes. The line itself is below the male athletes because historically, there has been less female competitors, but now female athletes are improving at the same rate.
There are different ways to calculate “success”. Consider the following variants and choose one (and make sure your choice is clear in the visualization):
- Just consider gold medals.
- Simply add up the number of medals of different types.
- Create an index in which medals are valued differently. (gold=3, silver=2, bronze=1).
- A reasonable other way that you prefer.
## count of all medals won during Winter Olympics
med_count <- df1 %>%
filter(Season =="Winter") %>%
filter(!is.na(Medal)) %>%
group_by(Country, Medal) %>%
summarize(isMedal = n()) %>%
pivot_wider(
names_from = Medal,
values_from = isMedal
)
## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
med_count
## # A tibble: 39 × 4
## # Groups: Country [39]
## Country Bronze Gold Silver
## <chr> <int> <int> <int>
## 1 Australia 7 6 3
## 2 Austria 103 79 98
## 3 Belarus 5 6 4
## 4 Belgium 7 2 4
## 5 Bulgaria 3 1 2
## 6 Canada 107 305 199
## 7 China 34 16 30
## 8 Croatia 1 4 6
## 9 Czech Republic 34 27 12
## 10 Denmark NA NA 5
## # … with 29 more rows
## considering only gold medals won during Winter
gold_med <-df1 %>%
filter(Season =="Winter") %>%
filter(Medal=='Gold') %>%
group_by(NOC) %>%
group_by(Country) %>%
summarize(gold_med_count = n()) %>%
arrange(desc(gold_med_count))
gold_med
## # A tibble: 33 × 2
## Country gold_med_count
## <chr> <int>
## 1 Canada 305
## 2 United States 166
## 3 Germany 153
## 4 Norway 151
## 5 Sweden 125
## 6 Russia 94
## 7 Austria 79
## 8 Switzerland 76
## 9 Finland 66
## 10 Italy 57
## # … with 23 more rows
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## plot gold medals
gmplot <- gold_med %>%
ggplot(aes(x=Country, y=gold_med_count, fill=Country)) +
geom_bar(stat='identity') +
coord_flip() + labs(x="Olympic Year",y="Gold Medals", title = "Winter Gold Medals")
ggplotly(gmplot)
## top 10 times country won gold medal
top10 <- gold_med %>%
ungroup() %>%
slice(1:10)
top10
## # A tibble: 10 × 2
## Country gold_med_count
## <chr> <int>
## 1 Canada 305
## 2 United States 166
## 3 Germany 153
## 4 Norway 151
## 5 Sweden 125
## 6 Russia 94
## 7 Austria 79
## 8 Switzerland 76
## 9 Finland 66
## 10 Italy 57
## plot for top 10 countries who won gold medals
top10plot <- top10 %>%
ggplot(aes(x=Country, y=gold_med_count, fill=Country)) +
geom_bar(stat='identity', position = "stack") +
labs(x="Countries",y="Gold Medals",
title = "Most Winter Gold Medals Won by a Country (Top 10)")
ggplotly(top10plot)
According to both plots with unadjusted rankings, Canada has won the most gold medals the most times out of all of the Winter Olympics.
Now, adjust the ranking of medal success by (a) GDP per capita and (b) population. You have now three rankings: unadjusted ranking, adjusted by GDP per capita, and adjusted by population.
## adjusted by GDP per capita
gdp_gold <-df1 %>%
filter(Season =="Winter") %>%
filter(Medal =="Gold") %>%
group_by(Country, `GDP per Capita`) %>%
summarize(gold_med_count=n_distinct(ID)) %>%
arrange(desc(gold_med_count))
## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
## top 10 countries adjusted by gdp
adj_gdp <-gdp_gold %>%
ungroup() %>%
slice(1:10)
## top 10 plot gold medals adjusted by gdp per capita
gdpplot <- ggplot(adj_gdp, aes(x = Country , y =`gold_med_count`/`GDP per Capita`,
fill = Country)) + geom_bar(stat='identity') +
labs(x ="Country", y = "Medals/GDP per Capita",
title = "Success Measure of Gold Medals in the Winter Olympics (Top 10 Adjusted by GDP per Capita)")
ggplotly(gdpplot)
## gold medals adjusted by population
pop_gold <-df1 %>%
filter(Season =="Winter") %>%
filter(Medal =="Gold") %>%
group_by(Country, Population) %>%
summarize(gold_med_count=n_distinct(ID)) %>%
arrange(desc(gold_med_count))
## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
## top 10 countries adjusted by pop
adj_pop <-pop_gold %>%
ungroup() %>%
slice(1:10)
popplot <- adj_pop %>%
ggplot(aes(x=Country, y=gold_med_count/Population,
fill= Country)) +
geom_bar(stat='identity', position = "stack") +
labs(x="Country",y="Medal/Population",
title = "Success Measure of Gold Medals in the Winter Olympics (Top 10 Adjusted by Population)")
ggplotly(popplot)
When not adjusted for population or GDP per capita, Canada has won the most gold medals in the Winter Olympics. When the rankings are adjusted for GDP per capita, we can see that Russia has won the most gold medals in the Winter Olympics when accounting for GDP per capita. Finally, when adjusting for population, Norway has won the most gold medals relative to its small size.
Until the 2014 Sochi Winter Olympics (our data for Winter Olympics end here), there were 19 host cities. Calculate whether the host nation had an advantage. That is calculate whether the host country did win more medals when the Winter Olympics was in their country compared to other times.
Note, that the 19 host cities are noted in the data but not the countries they are located in. This happens commonly and often Wikipedia has the kind of additional data you want for the task. To save you some time, here is a quick way to get this kind of table from Wikipedia into R:
library(rvest)
library(stringr)
library(tidyverse)
wiki_hosts <- read_html("https://en.wikipedia.org/wiki/List_of_Olympic_Games_host_cities")
hosts <- html_table(html_nodes(wiki_hosts, "table")[[2]], fill=TRUE)[-1]
hosts %>% filter(Winter != "") %>%
select(City, Country, Year)
## countries that have hosted
host_country <- df1 %>%
filter(Country == "France" |
Country == "Switzerland" |
Country == "United States" |
Country == "Germany" |
Country == "Japan" |
Country == "Italy" |
Country == "Norway" |
Country == "Austria" |
Country == "Yugoslavia" |
Country == "Canada" |
Country == "Russia" |
Country == "South Korea" |
Country == "China")
## medals won by germany
germany_medal <- host_country %>%
filter(Season =="Winter") %>%
filter(Country == "Germany") %>%
group_by(Year) %>%
summarize(Medals = n())
Provide a visualization of the host country advantage (or absence thereof).
## plot for germany
germany_plot <- ggplot(germany_medal, aes(x = Year , y = Medals)) +
geom_bar(aes(fill = (Year == 1936)), stat='identity') +
scale_fill_discrete(name = "Host Year") +
labs(x ="Year", y = "Number of Medals",
title = "Winter Olympic Medals Won by Germany")
ggplotly(germany_plot)
In the graph above, I have selected a host country, Germany, which has hosted the Winter Olympics once in 1936. Here, we can see that during the highlighted host year, the total number of medals won for Germany was 67. This is a drastic jump from the previous Winter Olympics in 1932 where Germany won only 23 medals. Although, that number has increased to as many as 271 medals in recent years even though Germany has not been a host country since. In Germany’s case, being a host country might have provided a slight advantage back when the Winter Olympics was not as large or developed as it is now. However, overall I think being a host country did not provide a notable advantage compared to times where they did not host. I would attribute this increase to the improvement of German athletes and the increase of events/amount of athletes rather than the fact that Germany was not a host country.
## US medals
US_medal <- host_country %>%
filter(Season =="Winter") %>%
filter(Country == "United States") %>%
group_by(Year) %>%
summarize(Medals = n())
## plot for US medals
US_plot <- ggplot(US_medal, aes(x = Year , y = Medals)) +
geom_bar(aes(fill = (Year == 1932 |
Year == 1960 |
Year == 1980 |
Year == 2002)), stat='identity') +
scale_fill_discrete(name = "Host Year") +
labs(x ="Year", y = "Number of Medals",
title = "Winter Olympic Medals Won by USA")
ggplotly(US_plot)
In this plot, the highlighted bars are the years that the United States hosted the Winter Olympics. Again, we can see that there is steady improvement in the number of medals won each Winter Olympics. For example, in 2006 when the US did not host, the team received 325 medals. In the previous Winter Olympics in 2002 when the US was a host country, the team received 313 medals, which is less than the following year when it wasn’t hosting. Therefore, similarly to Germany’s case, we can attribute the increase in medals to the improvement of the athletes and increase of events/participants again.
## Canada medals
canada_medal <- host_country %>%
filter(Season =="Winter") %>%
filter(Country == "Canada") %>%
group_by(Year) %>%
summarize(Medals = n())
## plot for Canada medals
canada_plot <- ggplot(canada_medal, aes(x = Year , y = Medals)) +
geom_bar(aes(fill = (Year == 1988 |
Year == 2010)), stat='identity') +
scale_fill_discrete(name = "Host Year") +
labs(x ="Year", y = "Number of Medals",
title = "Winter Olympic Medals Won by Canada")
ggplotly(canada_plot)
In this plot above, we see the number of medals won by Canada during the Winter Olympics. Canada’s case is different from the US and Germany. In looking at the year 1988 when Canada hosted the Winter Olympics for the first time, we can see that the team won 190 medals. This is a large jump from 1984 where Canada did not host and won 91 medals. Additionally, in 1992 where Canada also did not host, we see a decrease from 190 medals to 186 medals. Subsequently in 1994, the number of medals decreased even more to 149 medals. In Canada’s case, in that span of time, being a host country did provide a slight advantage in the total number of medals won at the Olympic games. However, again, I do believe the most dominant effect in increasing the amount of medals won is the improvement of the athletes and the augmentation of the Winter Olympics as a whole.
## most successful winter athletes
athletes <-df1 %>%
filter(Season =="Winter") %>%
filter(Medal =="Gold") %>%
group_by(Name, Medal, Country) %>%
summarize(gold_med_count=n()) %>%
arrange(desc(gold_med_count))
## `summarise()` has grouped output by 'Name', 'Medal'. You can override using the `.groups` argument.
## top 10 athletes
athletes10 <- athletes %>%
ungroup() %>%
slice(1:10)
## plot top 10 athletes
athletes10_plot <- ggplot(athletes10, aes(x = gold_med_count, y = Name,
fill = Country)) +
labs(x ="Number of Gold Medals", y ="Athlete",
title = "Most Successful Winter Olympic Athletes (Top 10)") +
geom_point()
ggplotly(athletes10_plot)
## success by gender
ath_gend <- df1 %>%
filter(Season == "Winter") %>%
filter(Medal == "Gold") %>%
group_by(Name, Sex) %>%
summarize(gold_med_count = n()) %>%
arrange(desc(gold_med_count))
## `summarise()` has grouped output by 'Name'. You can override using the `.groups` argument.
top_gend <- ath_gend %>%
ungroup() %>%
slice(1:20)
gend_plot <- ggplot(top_gend, aes(x = gold_med_count, y = Name, fill = Sex)) +
labs(x ="Number of Gold Medals", y ="Athlete",
title = "Top 10 Winter Olympic Gold Medalists by Gender") +
geom_point()
ggplotly(gend_plot)
From this plot, we can see that the most winningest Winter Olympic athlete is Ole Einar Bjrndalen who is male (8 gold medals). The second place is tied by both a male and female athletes at 6 gold medals.
Choose 2 of the plots you created above and add interactivity. One of the plots needs to be written in plotly rather than just using the ggplotly automation. Briefly describe to the editor why interactivity in these visualization is particularly helpful for a reader.
## plotly of Most Winter Gold Medals Won by a Country (Top 10)
plot_ly(top10, y = ~gold_med_count, x = ~Country,
color= ~Country,
type = "bar", mode='point')
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
## Warning: 'bar' objects don't have these attributes: 'mode'
## Valid attributes include:
## '_deprecated', 'alignmentgroup', 'base', 'basesrc', 'cliponaxis', 'constraintext', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'insidetextanchor', 'insidetextfont', 'legendgroup', 'legendgrouptitle', 'legendrank', 'marker', 'meta', 'metasrc', 'name', 'offset', 'offsetgroup', 'offsetsrc', 'opacity', 'orientation', 'outsidetextfont', 'selected', 'selectedpoints', 'showlegend', 'stream', 'text', 'textangle', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'width', 'widthsrc', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
This plot shows the top 10 gold medal winners in the Winter Olympics. Plotly is helpful in this situation because you can see the exact total amount of gold medals won by hovering over each bar. I also used ggplotly for this graph previously, but I think that plotly’s aesthetics/visualization are more pleasant than ggplotly’s theme.
## ggplotly of Male vs. Female Winter Gold Medalists Over time
ath_plot <- ggplot(winner_sex, aes(x=Year, y=Athletes, group=Sex, color=Sex)) +
geom_point(size=2) +
geom_line() +
scale_color_manual(values=c("blue","red")) +
labs(x = "Year", y = "Athletes",
title="Male vs. Female Winter Gold Medalists Over time")
ggplotly(ath_plot)
This plot shows the comparison between male and female winter gold medalists over time. Again, here ggplotly is helpful in pinpointing the exact the year, scrolling over, and being able to see the exact number of gold medalists on each line. The graph itself is very simple and easy to discern the trend without being too crowded with unnecessary data points.
Prepare a selected data set and add a datatable to the output. Make sure the columns are clearly labelled. Select the appropriate options for the data table (e.g. search bar, sorting, column filters etc.). Suggest to the editor which kind of information you would like to provide in a data table in the online version of the article and why.
library(DT)
## choosing only USA because dataset too large
data_table <- df1 %>%
filter(NOC == 'USA') %>%
select(Year, Sport, Name, Medal)
datatable(
data_table, colnames = c('Year','Sport', 'Name', 'Medal'),
filter = list(position = 'top', clear = FALSE),
options = list(
search = list(search = ''),
pageLength = 10
)
)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
The data comes in a reasonably clean Excel data set. If needed for your visualization, you can add visual drapery like flag icons, icons for sports, icons for medals etc. but your are certainly not obligated to do that.
Part of the your task will be transforming the dataset into a shape that allows you to plot what you want in ggplot2. For some plots, you will necessarily need to be selective in what to include and what to leave out.
Make sure to use at least three different types of graphs, e.g. line graphs, scatter, histograms, bar chats, dot plots, heat maps etc.
Please follow the instructions to submit your homework. The homework is due on Wednesday, February 16 at 5pm
Yes, the medal counts of the olympics have surely been analyzed before. If you do come across something, please no wholesale copying of other ideas. We are trying to practice and evaluate your abilities in using ggplot2 and data visualization not the ability to do internet searches. Also, this is an individually assigned exercise – please keep your solution to yourself.